Abstract

Motivated by recent efforts by the criminal justice system to treat and rehabilitate nonviolent offenders rather than focusing 
solely on their punishment, we introduce an evolutionary game theoretic model to study the effects of "carrot and stick" 
intervention programs on criminal recidivism. We use stochastic simulations to study the evolution of a population where 
individuals may commit crimes depending on their past history, surrounding environment and, in the case of recidivists, on 
any counseling, educational or training programs available to them after being punished for their previous crimes. These 
sociological factors are embodied by effective parameters that determine the decision making probabilities. Players may 
decide to permanently reform or continue engaging in criminal activity, eventually reaching a state where they are 
considered incorrigible. Depending on parameter choices, the outcome of the game is a society with a majority of virtuous, 
rehabilitated citizens or incorrigibles. Since total resources may be limited, we constrain the combined punishment and 
rehabilitation costs per crime to be fixed, so that increasing one effort will necessarily decrease the other. We find that the 
most successful strategy in reducing crime is to optimally allocate resources so that after being punished, criminals 
experience impactful intervention programs, especially during the first stages of their return to society. Excessively harsh or 
lenient punishments are less effective. We also develop a system of coupled ordinary differential equations with memory 
effects to give a qualitative description of our simulated societal dynamics. We discuss our findings and sociological 
implications. 
Introduction

The emergence of human cooperation is a subject of great 
interest within the behavioral sciences. In recent years several 
studies have tried to understand why such an exceptional level of 
cooperation among humans exists despite individual gains that 
may be attained if people acted selfishly. Some of the current 
hypotheses to explain large scale cooperation are based on player 
reciprocity, status, or altruistic and tit-for-tat behaviors between 
two actors [1-4]. One of the most endorsed theories however 
includes third party punishment, where defectors are punished for 
following their self-serving interests [5,6]. 

Game theory has often been used as a tool to explore human or 
animal behavior since its mathematical framework allows for the 
study of the dynamics of players and their choices in a systematic, 
albeit simplified, way. As a result, many authors within several 
disciplines have developed and analyzed games that include the 
effects of punishment as a way to foster cooperation among 
humans [7-9]. Most, but not all, of these studies are based on the
classic prisoner's dilemma paradigm [10] and include elements
such as the severity of sanctions and the willingness of participants
to punish offenders [11], the frequency and expectation of
enforcement [12], collective punishment and rewards [13-15],
network structures [16] and the possibility of directly harming
adversaries [17-19]. On the other hand, very little work has
focused on studying recidivism by offenders after punishment and 
how prevention measures - and not only punishment - taken by 
third parties may improve recidivism rates and affect cooperation. 

In this paper we focus on recidivism and rehabilitation within 
the specific context of criminal behavior, where cooperators are 
law abiding citizens and where defectors are criminals that may be 
punished if apprehended. We introduce a dynamic game-theoretic 
model to study how player choices change over time not only due 
to punishment after an offense, but also due to possible post- 
punishment intervention given by third parties as prevention 
against future crimes, in the form of housing, job, training or 
family assistance. In our "carrot vs. stick" game we start from 
non-offenders who are progressively exposed to opportunities to 
commit crimes. The probability of offending is dependent on 
external factors, such as societal pressure or the threat of 
punishment, and internal ones, such as the player's criminal 
history. Since we assume that repeat offenders are provided with 
assistance upon release, the probability to commit a crime also 
depends on the quality and duration of any previously assigned 
post-release assistance. Finally, to model the limited resources 
available to law enforcement agencies [20,21], we assume that the 
combination of punishment and post-release program costs per 
incarceration is fixed: the more punishment a player is subject to,
the less post-release intervention assistance he or she will receive. 

The rules of our game are chosen so that players will progress in 
their criminal careers as recidivists, until they are considered 
incorrigible, or choose to shun their criminal lives and become 
virtuous citizens. In this way, an initial society will evolve towards 
a final configuration comprised solely of incorrigibles or virtuous 
citizens. From a mathematical standpoint our evolutionary game 
will include history dependent strategies so that individuals placed
in the same circumstances may choose different courses of action 
depending on their past crimes. Furthermore since each player's 
choices depend on the entire societal makeup, our model includes 
global interactions. 

We will analyze the ratio of the two final populations as a
function of relevant parameters and show that under certain 
circumstances, post-release intervention programs, if structured to 
be long lasting, may have important consequences on the final 
societal makeup and be more effective than punishment alone. In 
particular, we will show that the ratio of incorrigibles to virtuous 
citizens may be optimized by properly balancing available 
resources between punishment and post-release assistance. 
Indeed, this is the main result of our paper: that punishment 
and assistance are effective, complementary tools in reducing
crime, and that a judicious application of both will yield better
results than focusing solely on either one. 

It is important to note that while several "carrot and stick" 
evolutionary games have been introduced in the context of public 
goods games [14,22,23], in most cases, the carrot and the stick are 
mutually exclusive. Players are either rewarded for their cooper- 
ative actions or punished for their selfish behavior, but not subject 
to both incentives and punishment at the same time. In our work 
instead, all criminal-defectors are subject first to the stick, via the 
punishment phase, and later to the carrot, in the rehabilitation 
phase. As mentioned above, if the total amount of resources to be 
spent on each criminal is finite, then the optimal way of reducing 
crime is a balanced approach, where criminals are punished
adequately while at the same time receiving enough incentives 
for rehabilitation. 

In the remainder of this Introduction, we motivate our work by 
including a brief discussion on recidivism and rehabilitation. In the 
Analysis section we introduce our dynamical game and justify the 
variable and parameter choices made to model societal trends. We 
present our numerical findings in the Results section where we also 
derive a set of coupled ordinary differential equations with
memory to describe the dynamics more succinctly. We show that 
the two approaches - simulations and solving coupled ordinary 
differential equations - lead to qualitatively similar results. We end 
with a brief Summary and Discussion where we discuss findings 
from our "carrot and stick" game and their sociological 
implications. 

Sociological background 

Starting from the 1970s, the severity of punishment for 
criminal offenses in the United States has been steadily 
increasing, as evidenced by growing incarceration rates, swelling 
prison populations, longer sentencing and the increasing 
popularity of mandatory minimum sentencing policies, such as 
"three strikes" laws[24,25]. At present, the United States has 
one of the highest incarceration rates in the western world, with 
about one percent of the population imprisoned at any given 
time [26]. The cost incurred by the taxpayer to fund the 
criminal justice system - including day to day expenditures, 
facility maintenance and construction, court proceedings, health 
care and welfare programs - is estimated to be a staggering $74
billion for 2007 alone [27]. Related social problems include
prison overcrowding and violence, racial inequities, broken 
families left behind, and releasing into the community individ- 
uals who have not been rehabilitated during their prison time 
and are ill-equipped to lead a crime free life after being released 
to the larger society. 

One of the prevailing schools of thought is that the severity, 
unpleasantness and social stigma of life in prison may serve as 
deterrents to future criminal behavior, promoting the principle 
that "crime does not pay"[28]. Opposing points of view contend 
that due to the mostly poor conditions within prisons and lack of 
opportunities for change, most inmates will be returned to society 
hardened and, having been exposed to an environment domi- 
nated by more experienced criminals, more savvy and likely to 
offend again. Indeed, several criminological studies have shown 
that harsher sentences do not necessarily act as deterrents and 
may even slightly increase the likelihood of offending [29-31]. 
On the other hand, social intervention and support combined 
with punishment and coercion have been shown to be effective in 
preventing crimes[32,33]. 

Recidivism rates in the United States vary depending on crime. 
In the case of property and drug related offenses, the likelihood of 
rearrest within three years after release is about 70 percent[29], 
higher than that of most western countries. Thus, in recent years,
due to mounting incarceration costs and high recidivism rates, law 
enforcement and correction agencies have begun turning to novel 
approaches, designed to offer rehabilitation programs to prisoners 
during incarceration and assistance upon release. Such programs 
include counseling to increase self-restraint, drug treatment, 
vocational training, educational services, housing and job assistance,
community support, helping rekindle family ties, and even
horticulture [34,35]. The issue is a multifaceted one and for former 
inmates, the question of whether or not to re-offend is a highly 
individual one that depends on their personal histories[29,36], 
their experiences while in jail, and the environment they are 
released to [29]. In general, the most successful intervention 
programs have been the ones that offered the most post-release 
assistance [37].

Analysis 

In this section we present the evolutionary game theory model 
we developed as inspired by the sociological observations 
described above. We consider a population of N individuals 
where each player carries his or her specific history of $k = 0, 1, \ldots$
past offenses, whether punished or unpunished. Thus, at any time
we have sub-populations of $N_0, N_1, \ldots, N_k$ individuals with a
record of $k \ge 0$ past crimes.

We assume that when faced with the opportunity to commit a 
crime, players may decide to offend and transition from state $N_k$
to $N_{k+1}$. If they choose not to commit a crime, they may either
remain in state $N_k$ or choose to shun criminal activity altogether,
for any and all future opportunities. Individuals who decide never
to commit crimes again in the future, regardless of record and
circumstances, are called paladins. Since paladin behavior is
fixed, we take these individuals out of the game as active players
and place them in the subpopulation $P$. Note that the difference
between paladins $P$ and players in the $N_0$ subpopulation is that a
paladin may have committed crimes in the past, but will not
commit any crimes in the future, whereas an individual belonging
to $N_0$ has not committed any crimes yet, but may in the future, if
the occasion presents itself.

Upon committing crimes, players may or may not be arrested 
and punished. We assume that once a player has been arrested R 
times, he or she is considered incorrigible and incarcerated until 
the end of the game, mimicking mandatory sentencing policies.
Thus, after $R$ arrests players are taken out of the game and placed
into the pool of unreformables $U$. As a result, while players may
transition between states $N_k$, states $P$ and $U$ act as sinks with
paladins and unreformables no longer involved in the game as
active participants.

Finally, population conservation holds so that, at all times

$$\sum_{k=0}^{\infty} N_k + P + U = N. \qquad (1)$$

Note that players may have committed $k > R$ crimes before
being arrested so that the summation over $k$ in Eq. 1 is in
principle unbounded.

For simplicity, we will consider an initial population of players 
with no criminal history so that initial conditions are set as 
$N_0 = N$, and $N_{k>0} = U = P = 0$. We follow societal dynamics
from the neutral state $N_0$ towards subsequent states $N_{k>0}$, $U$ or $P$
by assuming that when faced with the opportunity to commit a
crime, players will decide to offend or not based on past history,
apprehension likelihood, societal pressure, the threat of punishment
but also, in case of recidivists, on possible forms of
rehabilitation previously offered by society. As we shall later see,
by construction, the game will end when all players are either
paladins or unreformables, so that, eventually, $P + U = N$. A
quantity of interest throughout this work will thus be the $P/U$
ratio, which we use as the final indicator of whether an ideal
society is attained, with $P/U \gg 1$, or whether instead a
dysfunctional society emerges, with $P/U \to 0$. Note that in
principle we could consider an open-ended game where criminals 
are continuously exposed to crime opportunities to which they 
respond depending on their past history. In this case, however, 
we would need to define a specific measure to describe the degree 
of optimality of a society, to replace the P/U ratio. We choose to 
work with players irreversibly turning into paladins or unreform- 
ables since the P,U sinks naturally define P/U as a mathemat- 
ically straightforward order parameter. 

The game is played out in a succession of "rounds" $r$. At each of
these rounds, an individual $i$ is selected at random from any of the
$N_k$ pools. We assume the individual $i$ in the $N_k$ group has a history
of $k_p$ punished and $k_u$ unpunished crimes, so that $k = k_p + k_u$. At
each arrest and conviction the player is punished by an amount $\theta$
but also given educational and employment opportunities of
magnitude $h$ for a duration $\tau$. The dimensionless parameter $\theta$ thus
represents the stick of our game, while the parameters $h, \tau$ describe
the carrot. Since decisions made by an individual depend on past
criminal record, we describe each player via a string
containing punishment status and round of crime occurrence. We
label each convicted crime by 1 and each unpunished crime by 0.
For example, if a player is in pool $N_3$ this implies there have been
3 crimes, committed at rounds $r_\ell$ where $1 \le \ell \le 3$. If we assume,
say, that the first two crimes were left unpunished while the player
was punished for the third one, the history string associated with
individual $i$ is $(\{r_1,0\},\{r_2,0\},\{r_3,1\})$. In this example $k_p = 1$ and
$k_u = 2$.
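
The history string described above can be held in a small record type. The following is a minimal Python sketch under stated assumptions: the class name `Player`, its method names and the example round numbers 4, 9 and 15 are illustrative choices that do not appear in the model description, and `r_last` is taken to be the round of the last punished crime.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Player:
    """One player (hypothetical container, not from the paper).

    `history` holds (round, flag) pairs, with flag 1 for a punished crime
    and flag 0 for an unpunished one, mirroring the history string.
    """
    history: List[Tuple[int, int]] = field(default_factory=list)

    def record_crime(self, round_number: int, punished: bool) -> None:
        # Append one crime to the history string.
        self.history.append((round_number, 1 if punished else 0))

    @property
    def k_p(self) -> int:
        # Number of punished crimes.
        return sum(flag for _, flag in self.history)

    @property
    def k_u(self) -> int:
        # Number of unpunished crimes.
        return len(self.history) - self.k_p

    @property
    def k(self) -> int:
        # Total number of crimes, k = k_p + k_u.
        return len(self.history)

    @property
    def r_last(self) -> Optional[int]:
        # Round of the most recent punished crime, or None if never punished.
        punished_rounds = [r for r, flag in self.history if flag == 1]
        return punished_rounds[-1] if punished_rounds else None


# The example from the text: two unpunished crimes, then a punished one.
example = Player()
example.record_crime(4, punished=False)
example.record_crime(9, punished=False)
example.record_crime(15, punished=True)
```

Storing the full list of pairs, rather than only the two counters, keeps the round of the last punishment available for the time-dependent rehabilitation term.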

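Individual $i$ is now faced with the choice of whether to commit a
new crime or not. We assume this occurs with probability $p_{\mathrm{crime}}$
given by

$$p_{\mathrm{crime}} = \frac{1}{2}\left(\frac{p_0 + k_u}{k_u + \theta k_p + p_0} + \frac{\sum_{k \ge 1} N_k + U}{N}\right)\left(1 - h\,e^{-(r - r_{\mathrm{last}})/\tau}\right). \qquad (2)$$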



We choose this form - given by the sum of two terms, multiplied
by an attenuating factor - to embody the assumption that
individuals commit crimes depending on their own personal
history [36], represented by $p_i$, and on the surrounding community
imprint [38], represented by $s_i$, in equal manner. Given this crime
propensity, we assume that the probability of committing a crime
is finally modulated by the recidivism probability $a_i$, which
includes any resources individual $i$ may have received in the past.
In Eq. 2 we assume that if no crimes have been committed yet,
$r_{\mathrm{last}} \to -\infty$ so that, effectively, no resources have been assigned
either. Note that at the onset of the game when $N_{k>0}, k_u, k_p = 0$, the
overall probability to commit a crime is $1/2$, so that individuals are
equally likely to offend or not.

We now examine the terms in Eq. 2 in more detail. The first
term $p_i$ is the contribution to $p_{\mathrm{crime}}$ that strictly depends on the
player's past history [36], given by

$$p_i = \frac{p_0 + k_u}{k_u + \theta k_p + p_0}. \qquad (3)$$



The form for the "stick" is chosen such that previous
unpunished crimes $k_u$ embolden the criminal, since $p_i$ is an
increasing function of $k_u$. Similarly, previously punished crimes
will hinder the likelihood of future offenses, since $p_i$ is decreasing
in $\theta k_p$. Note that $p_i < 1$ only when $\theta k_p > 0$: if $\theta = 0$ there are no
consequences for committing crimes and players will always
inherently want to offend; if $k_p = 0$ the criminal was never
punished and feels emboldened by the impunity. The intrinsic
crime probability $p_i$ increases with $p_0$ for all values of $k_u, k_p, \theta$. The
parameter $p_0$ is also a measure of how sensitive $p_i$ is to punishment
after the first crime and apprehension. Consider the case
$k_u = 0, k_p = 1$. Upon differentiating $p_i$ with respect to $\theta$ and setting
$\theta = 0$ one finds

$$\left.\frac{\partial p_i}{\partial \theta}\right|_{\theta = 0} = -\frac{1}{p_0}, \qquad (4)$$



so that larger values of $p_0$ represent a smaller sensitivity to the
punishment $\theta$. The next term in Eq. 2 is $s_i$, a
societal pressure term which we model by

$$s_i = \frac{\sum_{k \ge 1} N_k + U}{N}. \qquad (5)$$



Including $s_i$ in $p_{\mathrm{crime}}$ allows us to incorporate the assumption
that crimes will generate more crimes, either by imitation, or by
observed degradation of the community [38]. On the other hand,
if the community is mostly comprised of virtuous $P$ or neutral
citizens $N_0$, the societal pressure term is very small and so is the
probability of committing crimes. In the limit of $P \to N$, $s_i \to 0$.
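
As a numerical check of Eqs. 3 and 4, the sketch below evaluates the intrinsic term $p_i$ and estimates its slope in $\theta$ at $\theta = 0$ by a forward difference. The function names and the step size `eps` are assumptions made for illustration only; they are not part of the model.

```python
def intrinsic_propensity(k_u: int, k_p: int, theta: float, p0: float) -> float:
    """Eq. 3: p_i = (p0 + k_u) / (k_u + theta * k_p + p0), assuming p0 > 0."""
    return (p0 + k_u) / (k_u + theta * k_p + p0)


def sensitivity_at_zero(p0: float, eps: float = 1e-6) -> float:
    """Forward-difference estimate of d p_i / d theta at theta = 0.

    Uses the case k_u = 0, k_p = 1 considered in the text; the exact
    value from Eq. 4 is -1 / p0.
    """
    at_eps = intrinsic_propensity(0, 1, eps, p0)
    at_zero = intrinsic_propensity(0, 1, 0.0, p0)
    return (at_eps - at_zero) / eps


# Larger p0 gives a slope of smaller magnitude, i.e. a weaker response to theta.
slopes = {p0: sensitivity_at_zero(p0) for p0 in (0.1, 1.0)}
```

With no punished crimes the function returns 1 regardless of $k_u$, which is the "emboldened" limit described above.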

PLOS ONE | www.plosone.org | January 2014 | Volume 9 | Issue 1 | e85531

Recidivism and Rehabilitation of Criminals

Finally, the sum $(p_i + s_i)/2$ is attenuated by the factor $a_i$ due to
societal intervention evaluated at the last round player $i$ committed
a crime. We model the effect of the "carrot" by the functional
form

$$a_i = 1 - h\,e^{-(r - r_{\mathrm{last}})/\tau}, \qquad (6)$$



where $r_{\mathrm{last}}$ denotes the round number at which the last punished
crime occurred. This term represents intervention and help from
third parties, such as helping individual $i$ with employment,
education opportunities, or, in the case of youth, the support of a
mentor. We assume that these assistance programs are implemented
with intensity $h$ and decrease in time over a period $\tau$.
Note, from Eq. 6, that if $\tau \ll r - r_{\mathrm{last}}$ and rehabilitation programs
are short lived, the exponential tends to zero, $a_i$ approaches 1, and
there is no attenuation effect. On the other hand, if $\tau \gg r - r_{\mathrm{last}}$, the
attenuation is most effective at $1 - h$. We assume $0 \le h \le 1$. In
principle, we could also let both $h$ and $\tau$ depend on crime number
$k_p$, but for simplicity we will keep them constant for the remainder
of this work.
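
Eqs. 2 and 6 can be combined in a few lines of Python. This is only a sketch: the function names are hypothetical, the societal pressure `s_i` is passed in by the caller rather than computed, and a never-punished player is represented by `r_last=None`, an assumed encoding of the limit $r_{\mathrm{last}} \to -\infty$.

```python
import math
from typing import Optional


def attenuation(h: float, tau: float, r: int, r_last: Optional[int]) -> float:
    """Eq. 6: a_i = 1 - h * exp(-(r - r_last) / tau), assuming tau > 0.

    r_last=None stands for a player who was never punished, for whom no
    resources were assigned and a_i = 1.
    """
    if r_last is None:
        return 1.0
    return 1.0 - h * math.exp(-(r - r_last) / tau)


def crime_probability(k_u: int, k_p: int, theta: float, p0: float,
                      s_i: float, h: float, tau: float,
                      r: int, r_last: Optional[int]) -> float:
    """Eq. 2: p_crime = (p_i + s_i) / 2 * a_i, with s_i supplied by the caller."""
    p_i = (p0 + k_u) / (k_u + theta * k_p + p0)  # intrinsic term, Eq. 3
    return 0.5 * (p_i + s_i) * attenuation(h, tau, r, r_last)


# Onset of the game: no crimes, no societal pressure, no assistance assigned.
p_start = crime_probability(0, 0, 0.5, 0.1, 0.0, 0.8, 2.0, r=1, r_last=None)
```

Right after a punished crime the attenuation equals $1 - h$, and it relaxes back to 1 once $r - r_{\mathrm{last}}$ is large compared with $\tau$.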

After player $i$ is faced with the opportunity to commit a crime,
the game proceeds depending on the choices made. If the crime
was not committed, the game proceeds to the strategy change
phase; otherwise an apprehension and punishment phase plays out.
We assume that the apprehension and punishment probability is $\alpha$,
and that the odds of being arrested are known to criminals. We
also assign resources $h, \tau$ to a criminal every time he or she is
arrested, regardless of their criminal past history. The final step of
the game is for player $i$ in population $N_k$ to update his or her
strategy. We start with the possibility that the player has
committed no crimes; in this case, he or she will either proceed
to the paladin pool $P$ with probability

$$p_{\mathrm{reform}} = \frac{\alpha P}{N} \qquad (7)$$



or remain in the current subpopulation $N_k$ with probability
$1 - p_{\mathrm{reform}}$. The underlying idea is that we assume player $i$ will
commit to turning his or her life around after having been "tempted"
and not having caved in to crime. We further assume this decision
depends on societal imprint expressed by the proportion of
virtuous citizens, $P/N$, and modulated by $\alpha$, the probability of an
arrest.

If player $i$ committed a crime but was not apprehended, he or
she moves from pool $N_k$ to pool $N_{k+1}$ with probability 1. In this
case, since there were no consequences for having committed 
crimes, we assume players likewise have no incentives not to 
commit criminal actions in the future. The last case is when a 
crime was committed, the criminal was apprehended and 
incentives for rehabilitation were assigned. Under this scenario, 
we assume that the criminal decides to turn into a law-abiding 
citizen and join the paladin pool P via the probability 



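[Eq. 8, which defines $p_{\mathrm{reform}}$ after an arrest, is only partly legible in the source: the surviving fragments are the term $h \alpha P / N$ and a fraction with denominator $\theta k_p + k_u + p_0$.] (8)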



while he or she will join the population $N_{k+1}$ with probability
$(1 - p_{\mathrm{reform}})$. In Eq. 8 we assume that the reform probability
depends both on societal imprint and on the player's punishment
history. In particular, if no resources or punishment are offered
and both $h = \theta = 0$ there is no incentive for players to reform. Note
that if a player committed a crime during round $r$, the $k_p$ to be
utilized when evaluating $p_{\mathrm{reform}}$ is the value at the onset of the
round, augmented by one. For all parameter values $p_{\mathrm{reform}} \le 1$.
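
The branching described in the preceding paragraphs can be sketched as a single function that resolves one crime opportunity for one player. The function name, the outcome labels and the use of Python's `random.Random` are assumptions for illustration. Both reform probabilities are supplied by the caller, so the sketch does not commit to a particular form of Eq. 8, and the transfer to $U$ after $R$ arrests is left to the calling code.

```python
import random


def resolve_opportunity(rng: random.Random, p_crime: float, alpha: float,
                        p_reform_clean: float, p_reform_arrested: float) -> str:
    """Return a label for the outcome of one crime opportunity.

    p_reform_clean    -- reform probability after declining the crime (Eq. 7)
    p_reform_arrested -- reform probability after an arrest (Eq. 8), taken as
                         an input here rather than computed
    """
    if rng.random() >= p_crime:
        # No crime this round: join P, or remain in the current pool N_k.
        if rng.random() < p_reform_clean:
            return "reformed_without_crime"
        return "stay"
    if rng.random() >= alpha:
        # Crime without arrest: move to N_{k+1} with probability 1.
        return "unpunished_crime"
    # Crime with arrest: k_p grows by one, then reform or move to N_{k+1}.
    if rng.random() < p_reform_arrested:
        return "arrested_reformed"
    return "arrested_recidivist"


# One draw with made-up probabilities and the arrest rate alpha = 1/4.
outcome = resolve_opportunity(random.Random(0), 0.5, 0.25, 0.1, 0.3)
```

Returning a label instead of mutating the player keeps the decision separate from the bookkeeping, which is convenient when all players are updated only after every decision in a round has been made.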

Finally, we assume that when players are arrested $R$ times they
are considered incorrigible and are sentenced to lengthy incarceration
periods that effectively take them out of the game and into
the unreformable pool $U$. These players act only as bystanders
and yield a negative imprint to society, just as paladins do but in a
positive manner. By construction, our game will end when all
players are either in subpopulation $P$ or $U$. A majority of paladins
represents a desirable, "utopian" society and a majority of
unreformables an undesirable, "dystopian" one.

To summarize, the parameter space associated with our model
consists of six quantities $\{h, \tau, \theta, p_0, \alpha, R\}$. However, consistent with
police estimates [39], we set the apprehension and punishment rate
$\alpha = 1/4$ and we fix $R = 3$ as the maximum number of punished
crimes before players join the pool of unreformables $U$. Thus, in
the remainder of this work we consider only the parameter set
$\{h, \tau, \theta, p_0\}$. All parameters and variables of interest are summarized
in Table 1.

Methods 

While statistical methods have been routinely used in the 
quantitative study of crime[40,41], game theory approaches are a 
relatively new contribution. On the other hand, there is a quite 
rich literature on Monte Carlo methods for simulating games that 
involve decision making and strategy updating[42]. In this work, 
we implement our criminal game as a Monte Carlo simulation 
where we track the behavior of each individual over the duration 
of the game and where each round is a discrete time step. As 
mentioned in the previous section, a dynamic history string that 
summarizes past crime and arrest occurrences is assigned to each
individual. For these, we evaluate transition probabilities between 
possible subpopulations Nk,P,U every time a decision process is 
involved. 

At every round we select a random player within any of the $N_k$
subpopulations and present him or her with the opportunity to
commit a crime, evaluating $p_{\mathrm{crime}}$ and $p_{\mathrm{reform}}$ to inform decisions
and strategy updates. We repeat this procedure for all $N - U - P$
players and update the resulting $N_k, P, U$ subpopulations only after
the decision process has been carried out for all players, consistent
with parallel update discrete time Monte Carlo methods [42]. We
also calculate relevant crime, punishment and recidivism statistics
at each round, until the end of the game, when all players are
either in the $U$ or $P$ subpopulations. Finally, we generate contours
of the ratio $P/U$ at the end of the game, which describes how ideal
the outcome society is, given the parameters $\{h, \tau, \theta, p_0\}$.

Table 1. List of subpopulations and of relevant parameters.

Symbol      Description
$P$         paladins
$U$         unreformables (who have been punished $R$ times)
$N_0$       number of persons that have committed no crimes
$N_k$       number of persons that have committed $k = k_u + k_p$ crimes
$k_u$       number of unpunished crimes per person
$k_p$       number of punished crimes per person
$h$         effective resource parameter
$\tau$      duration of assistance
$\theta$    severity of punishment
$p_0$       punishment sensitivity
$\alpha$    arrest and conviction probability
$R$         maximum number of punished crimes

We set $\alpha = 1/4$ and $R = 3$ throughout this work.
doi:10.1371/journal.pone.0085531.t001

Within our work, the average total number of crimes per player
is evaluated as the sum of migrations between subpopulations
$N_k \to N_{k+1}$ for $k = 0, 1, 2, \ldots, R-1$ over the course of the game,
normalized by the total number of players $N$. Similarly, the
average total number of punishments per player is defined as the
sum over increments of $k_p$ over the entire course of the game
normalized by $N$. Finally, the average recidivist rate is the sum
over increments of $k_p > 1$ normalized by the total number of
criminals who have been punished at least once [29]. In the Results
section, we investigate how all of the above quantities vary with the
model parameters $\{h, \tau, \theta, p_0\}$ for a set of $N = 400$ individuals. To
limit the space defined by our four parameter model we restrict $\tau < 3$
and consider only representative values of $p_0$, since - as we shall
see - our results are monotonic in $p_0$. Parameters $h, \theta$ instead will
be chosen as $0 \le h, \theta \le 1$, which are limitations imposed by the
model.

For each criminal conviction, the justice system will impose an
amount $\theta$ of punishment to the offender and an amount $h$, over an
effective period $\tau$, for rehabilitation and assistance, yielding a total
rehabilitation cost of $h\tau$. The latter is estimated by considering all
resources spent on rehabilitation from the moment of arrest when
$r = r_{\mathrm{last}}$ until the end of the game at $r \to \infty$, and using a continuum
approximation so that

$$\int_0^{\infty} h\,e^{-t/\tau}\,dt = h\tau. \qquad (9)$$

Since law enforcement may have limited total available
resources $c$ to both punish and rehabilitate a criminal we
introduce the constraint $h\tau + \theta = c$. Higher punishment levels $\theta$
thus translate to lower rehabilitation efforts $h\tau$, and vice versa. We
will often invoke this constraint when examining the variation of
derived quantities with respect to $h, \theta$.
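
A small helper makes this trade-off explicit. The sketch below, with a hypothetical function name, solves the constraint for $\theta$ given $c$, $h$ and $\tau$, and evaluates the three allocations of the budget $c = 1.64$ at $\tau = 2$ that are used in the Results section for Fig. 1(c), (e) and (f).

```python
def punishment_from_budget(c: float, h: float, tau: float) -> float:
    """Solve h * tau + theta = c for theta.

    Raises ValueError when the rehabilitation cost h * tau alone
    already exceeds the total budget c.
    """
    theta = c - h * tau
    if theta < 0:
        raise ValueError("rehabilitation cost h * tau exceeds the budget c")
    return theta


# (h, theta) pairs sharing the same budget; theta is approximately
# 0.04, 0.44 and 0.84 for h = 0.8, 0.6 and 0.4 respectively.
allocations = [(h, punishment_from_budget(1.64, h, 2.0)) for h in (0.8, 0.6, 0.4)]
```

Lowering $h$ by 0.2 at $\tau = 2$ frees 0.4 units of the budget, which is exactly the increase in $\theta$ between successive allocations.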

Results 

In this Section we show and discuss results from our Monte 
Carlo simulations for different parameter choices. As discussed 
above, in analyzing our data we will use the resource constraint 
$h\tau + \theta = c$. Note that the total number of crimes $k$ committed by a
player can increase but not decrease, so that the dynamics is 
irreversible. We thus expect to find final configurations that 
depend on our specific initial conditions. 

Population dynamics 

Since our game is constructed to evolve towards a final 
configuration where all players are either in subpopulation P or U, 
we follow the time evolution of the number of players in these 
states over the duration of the game. In Fig. 1 we show the 
dynamics of $P$ and $U$ as the game progresses for various choices of
$h, \theta$ when $p_0 = 0.1$ and $\tau = 2$. All curves are truncated at $r \sim 100$,
when $P + U = N$ and the game ends. We use initial conditions
$N_0 = N = 400$ and $N_{k>0} = 0$ ($IC_0$) and $N_0 = N_1 = N/2 = 200$ and
$N_{k>1} = 0$ ($IC_1$) to investigate the effects of different starting
choices. We let $k_u = 1$ and $k_p = 0$ for all $N_1$ individuals within $IC_1$
so that all players start the game without having been punished.

In Figs. 1(a) and (b) no resources are utilized for rehabilitation
programs ($h = 0$). The punishment level is set to the low value
$\theta = 0.04$ in panel (a), yielding a large number of unreformables for
both sets of initial conditions, while for the higher punishment
choice $\theta = 0.8$ in panel (b) we find that the number of paladins
exceeds that of unreformables $U$. Note the slightly different
behaviors for the two sets of initial conditions in panel (b): within
$IC_1$ the initial society includes individuals with a criminal past at
$r = 0$, and the final number of paladins is greater than for initial
conditions $IC_0$ where all citizens started out in the neutral state.
This difference arises as follows. At the onset of the
game $N_1$ for $IC_1$ is greater than $N_1$ for $IC_0$; due to the structure of
$p_{\mathrm{crime}}$, more crimes will be initially committed for $IC_1$ than for
$IC_0$. The high value of $\theta$ in $p_{\mathrm{reform}}$ will lead players who are
arrested to more likely reform, increasing the number of paladins
and decreasing $p_{\mathrm{crime}}$. This leads to a feedback loop that effectively
keeps increasing the number of paladins throughout the game and
that is larger for $IC_1$ than for $IC_0$ due to the initial conditions.

In Figs. 1(c) and (d) we keep the punishment levels equal to those 
used in panels (a) and (b) respectively but include the assignment of 
resources $h = 0.8$ over a time $\tau = 2$. As shown in Figs. 1(c) and (d)
adding resources dramatically increases the final number of
paladins. The behavior in panel (c), where there is a large
amount of resources but little punishment, is interesting: within
$IC_0$ the number of paladins at the end of the game is greater than
that of unreformables, but within $IC_1$ the opposite holds, showing
the importance of initial condition choices. In particular, within
$IC_1$, the initial presence of a large cohort of players with a criminal
past leads to a feedback loop where more crimes are encouraged
since punishment is low, leading to a large $U$ population. This
effect is less pronounced within $IC_0$ where players all start in the
neutral state.

In Figs. 1(e) and (f) we keep the same total amount of resources 
as in Fig. 1(c), $h\tau + \theta = 1.64$, but use a different realization of the
constraint: in panel (e) we allow for fewer resources $h = 0.6, \tau = 2$
and more punishment $\theta = 0.44$ while in panel (f) we decrease the
amount of resources even more, with $h = 0.4, \tau = 2$ and $\theta = 0.84$.
Given the $h\tau + \theta = 1.64$ constraint, a comparison of panels (c), (e)
and (f) shows that the relative number of paladins with respect to
unreformables can be maximized by optimally modulating the
parameter subset $\{h, \theta\}$. In particular for $IC_0$, out of the three
panels (c), (e), (f) examined, the parameter choice in (e), with the
optimal balance of punishment and rehabilitation efforts, is the
most effective in yielding the largest final $P/U$ ratio. On the other
hand, for $IC_1$, panel (f) yields optimal results. We will later explore
parameter space in more detail and study the final $P/U$ ratio over
a wider range of $\{h, \theta\}$ values.

Finally, in all panels of Fig. 1, we observe a slight delay in the 
increase in U compared to the initial dynamics of P. This is 
because player reform may occur from the beginning of the game,
while for an individual to join the U subpopulation he or she must 
have committed at least R = 3 crimes. 

Correlations between p_0 and h 

In this subsection we investigate the role of p_0 on the final value 
of the P/U ratio. Since p_0 appears only in Eq. 2, and p_crime is an 
increasing function of p_0, we expect all results to be similarly 
increasing in this parameter. In Fig. 2, we plot contours of the final 
P/U ratio as a function of p_0 and h for τ = 2 and θ = 0.1 using 
initial conditions IC_0. As expected, the final P/U ratio increases 
both in p_0 and h. In Fig. 2 we have also highlighted the {h, p_0} 
curve where the ratio P/U = 1. Note that for higher values of p_0, 
where p_crime is higher, more incentives for rehabilitation h are 
needed to yield a final society comprised of equal numbers of 
paladins and unreformables. In this case, introducing the total 
resource constraint hτ + θ = c is equivalent to selecting slices in 
Fig. 2 at fixed h. The resulting trend is clear: for fixed h better 
results are obtained on a low p_0 population, where the intrinsic 
probability p_0 to commit crimes is lower. All other quantities of 
Figure 1. Dynamics of Paladins P and Unreformables U for p_0 = 0.1, τ = 2 and variable h, θ under initial conditions IC_0 where 
N = N_0 = 400 and IC_1 where N_0 = N_1 = 200. (a) For h = 0, θ = 0.04, P ≪ U, since punishment is low and no post-release resources are allocated. For 
IC_1 the number of unreformables is slightly higher than for IC_0. (b) For h = 0, θ = 0.8, higher punishment leads to a deterrence effect and P ≳ U. This 
trend is more evident for IC_0 as explained in the text. (c) For h = 0.8, θ = 0.04, P ≈ U. (d) For h = 0.8, θ = 0.8, higher punishment leads to P > U. (e), (f) 
Dynamics under the constraint hτ + θ = 1.64 as in panel (c). 
doi:10.1371/journal.pone.0085531.g001 



interest yield similar monotonic trends - namely, the crime, 
punishment and recidivism rates are decreasing functions of 
{h, p_0} and we do not show them here. Similar considerations 
apply to initial conditions IC_1. 

Correlations between θ and h 

In this subsection we study how all quantities of interest vary 
within the {h, θ} parameter space for initial conditions IC_0, and 
for p_0 = 0.1, τ = 2. Qualitatively similar results arise for different 
values of p_0, so we keep this parameter fixed. In Fig. 3(a) we show 
that the final P/U ratio increases with both h and θ, while the total 
number of crimes and punishments and the recidivism rates in 
Figs. 3(b), (c) and (d), respectively, decrease with h and θ. These are 
predictable trends since increases in both rehabilitation and 
punishment tend to drive overall crime down. In particular, note 
that punishment per player values in panel (c) are approximately 
one-fourth of the crimes per player, shown in panel (b). This is 
expected, since the punishment probability is given by α = 1/4. 
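
The one-fourth ratio between punishments and crimes follows directly from each crime being punished independently with probability α. A minimal Python sketch of this check is given below; the total number of crimes, `n_crimes`, is a hypothetical value chosen only for illustration and is not taken from our simulations.

```python
import random

ALPHA = 0.25  # punishment (arrest) probability quoted in the text

def count_punishments(n_crimes, alpha, rng):
    """Count the crimes that lead to punishment when each is caught with probability alpha."""
    return sum(1 for _ in range(n_crimes) if rng.random() < alpha)

rng = random.Random(0)   # fixed seed so that the run is reproducible
n_crimes = 100_000       # hypothetical total number of crimes in a game
n_punished = count_punishments(n_crimes, ALPHA, rng)
ratio = n_punished / n_crimes
print(f"punishments per crime: {ratio:.3f}")
```

For a large number of crimes the ratio settles close to α = 1/4, as observed when comparing panels (b) and (c).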



We now introduce the constraint hτ + θ = c. In particular, in 
Fig. 4(b), we show the final P/U ratio as a function of h on the 
locus hτ + θ = c for τ = 2, p_0 = 0.1 to mirror the parameter choices 
made in Fig. 3. The three curves are for the constant set at 
c = 0.8, 0.6, 0.4, so that higher constants yield higher P/U values at 
the end of the game. The most interesting feature we observe is 
that optimal values of h and θ = c − hτ exist that yield maxima in 
the final P/U ratio. This implies, as mentioned earlier, that if law 
enforcement agencies have limited resources at their disposal to 
both punish and rehabilitate criminals, a proper balancing of these 
efforts may yield the best outcome in crime abatement. 
Furthermore, note that for small values of h, when θ is high, 
increasing the levels of rehabilitation h is beneficial, but that 
beyond a certain threshold, when h is too large and little 
punishment θ is assigned to criminals, the final ratio P/U starts 
decreasing, implying that both punishment and rehabilitation are 
necessary. While a similar behavior is found in Fig. 4(c) for τ = 2, 
different trends are observed in Figs. 4(a) and (d) where τ = 1 and 
τ = 2.5 respectively. In the latter cases, the final ratio P/U does 
Figure 2. Contours of the final P/U ratio as a function of p_0 and h for θ = 0.1, and τ = 2 for IC_0. Note that the final P/U ratio is an increasing 
function of p_0 and h. The solid curve marks the locus P = U. 
doi:10.1371/journal.pone.0085531.g002 
Figure 3. Contours of the final values of (a) the P/U ratio, (b) the number of crimes per player, (c) the number of punishments per 
player and (d) the recidivism rate as a function of h, θ for p_0 = 0.1 and τ = 2. Initial conditions IC_0 are chosen so that at the onset of the game 
N_0 = N = 400, and N_{k>0} = 0. 
doi:10.1371/journal.pone.0085531.g003 
Figure 4. The final P/U ratio plotted as a function of h under the constraint hτ + θ = c, for (a) τ = 1 (b) τ = 1.5 (c) τ = 2 and (d) τ = 2.5. The 
constant is chosen as c = 0.4, 0.6, 0.8 so that three curves are shown for each value of τ. Each curve terminates at θ = 0. Panel (b) is projected from 
Fig. 3(a). Note that for all c values, the most efficient allocation of resources is attained for the intermediate τ = 1.5. In particular, for c = 0.8 the largest final 
P/U ratio is attained at τ = 1.5, h = 0.3 and θ = 0.35 as shown in panel (b). Also note the emergence of maxima in panels (b) and (c). 
doi:10.1371/journal.pone.0085531.g004 



not change appreciably as h increases for low h. We notice instead 
a quasi-plateau regime, where increasing h and decreasing 
θ = c − hτ does not significantly affect the final P/U ratio. 
However, increasing h and decreasing θ further leads to decreases 
in the final P/U ratio: just as in Figs. 4(b) and (c) a threshold 
punishment level θ is necessary to keep P/U > 1 at the end of the 
game. Overall, the largest P/U ratio is attained for τ = 1.5, h = 0.3 
and θ = 0.35, when the number of paladins is double that of 
unreformables. 
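
The curves in Fig. 4 are obtained by moving along the locus hτ + θ = c, from h = 0 up to the point where θ vanishes. The Python sketch below only enumerates parameter pairs on that locus and checks that the optimum quoted for c = 0.8 lies on it; it does not evaluate the P/U ratio, and the helper names `constraint_locus` and `on_locus` are our own.

```python
def constraint_locus(c, tau, n_points=9):
    """Return (h, theta) pairs on h * tau + theta = c, from h = 0 up to theta = 0."""
    h_max = c / tau  # the curve terminates where theta reaches zero
    pairs = []
    for i in range(n_points):
        h = h_max * i / (n_points - 1)
        theta = max(c - h * tau, 0.0)  # clip round-off so theta is never negative
        pairs.append((h, theta))
    return pairs

def on_locus(h, theta, c, tau, tol=1e-9):
    """Check that a parameter pair uses exactly the total budget c."""
    return abs(h * tau + theta - c) < tol

# Optimum quoted for c = 0.8: tau = 1.5, h = 0.3, theta = 0.35.
print(on_locus(0.3, 0.35, c=0.8, tau=1.5))
pairs = constraint_locus(0.8, 1.5, n_points=5)
for h, theta in pairs:
    print(f"h = {h:.3f}, theta = {theta:.3f}")
```

Since 0.3 × 1.5 + 0.35 = 0.8, the quoted optimum is indeed a point on the c = 0.8 curve.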

Within the context of our model we find that if rehabilitation 
efforts are either too short or too long-lived they may be 
ineffective: in the first case because they do not last long enough 
to affect the criminal decision process, in the second case because 
long intervention programs with finite resources necessarily imply 
that these programs are not impactful enough and will have 
marginal effects on crime rates. Our findings imply that the best 
approach to maximize the final P/U ratio is to punish the criminal 
adequately while leaving enough resources to be used over an 
intermediate period of time towards the criminal's rehabilitation. 

This trend is confirmed in Fig. 5, where we plot contours 
corresponding to P/U = 1 at the end of the game in {h, θ} space for 
various values of τ and for p_0 = 0.1. Note that rehabilitation programs 
lasting for intermediate times τ = 1.5 yield the lowest lying curve, 
indicating that equal numbers of paladins and unreformables can be 
attained for lower resource h and punishment θ if intervention 
programs are neither too stretched out in time, nor too short. 



In Fig. 6 we plot the number of crimes per player throughout 
the game as a function of h for the same parameters, p_0 = 0.1, 
τ = 1, 1.5, 2, 2.5 and the same constraints used in Fig. 4. As can be 
seen from panels (b) and (c) a minimum in the crime rates may 
arise depending on parameter choices, partly mirroring the results 
found in Fig. 4. Note that for τ = 2 there is no minimum in the 
crime per player curves, which instead arises within the P/U plots. 
Similar trends may be found for the total number of punishments 
per player and for the recidivism rates. Together with our findings 
for the final P/U ratio, these results show that the occurrence of 
crime can be mitigated by properly balancing the partitioning of 
resources between punishment and rehabilitation. Finally, in Fig. 7 
we show the equivalent results of Fig. 3 for the case of IC_1. Note 
that although quantitatively different, the main features are similar 
to those obtained using IC_0. 

ODE-s corresponding to the model 

In order to obtain a qualitative description of the model, we 
formulate the dynamics in terms of ordinary differential equations 
(ODEs) for the relevant subpopulations. These "mass-action" type 
ODEs implicitly correspond to random sequential updating and 
are not expected to match exactly our simulation results, obtained 
using parallel update dynamics. Nonetheless, we expect such an 
approach to yield qualitatively valid results, with significantly less 
computational effort. Due to the complexity of the game and to 
history-dependent events, the dynamics cannot be reduced to a 
(13) 

For dP/dt and dU/dt we write 

$$ \dot{P} = \sum_{k_u=0} \sum_{k_p=0} [\ldots]\, N_{k_u,k_p} + [\ldots] \sum_{k_u=0}^{\infty} \sum_{k_p=0}^{R-2} c_{k_u,k_p}\, r_{k_u,k_p+1}\, N_{k_u,k_p} \tag{14} $$




Figure 5. Curves along which at the end of the game P/U = 1 
for different values of τ. Given τ, the area to the right of each curve 
corresponds to values of h, θ where P > U and the area to the left of 
each curve corresponds to values of h, θ where P < U. The curve for 
τ = 2 is projected from Fig. 3(a). When no rehabilitation resources are 
assigned (h = 0) τ does not play a role so curves intersect at the same 
value of θ. Note that the P/U = 1 curve is lowest for τ = 1.5, implying 
that for given h, θ the best way to populate society with an equal 
amount of paladins and unreformables is by selecting an intermediate 
value for τ. As explained in the text, intervention programs that are too 
brief or too long yield less efficient results. 
doi:10.1371/journal.pone.0085531.g005 

set of equations describing the time evolution of N_k, the number of 
players that have committed k crimes. Instead, we must keep track 
of how many crimes were punished and how many were not, 
leading to an expanded population. We thus introduce N_{k_u,k_p}(t) as 
the number of individuals who have committed k_u unpunished 
crimes and k_p punished ones until time t and study its evolution 
towards states with increasing k_u or k_p or towards the two possible 
sinks, P(t) or U(t). We choose to measure time in units of a single 
simulation update so that all probabilities used in our simulation 
rounds may be recast as rates per unit time. For notational 
simplicity we set p_crime → c_{k_u,k_p}, p_reform → r_{k_u,k_p}. The mass-action 
rate equations can be expressed as 



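$$ \dot{N}_{0,0} = -\left[ c_{0,0} + (1 - c_{0,0})\,[\ldots] \right] N_{0,0}, \tag{10} $$

$$ \dot{N}_{0,k_p} = -\left[ c_{0,k_p} + (1 - c_{0,k_p})\,[\ldots] \right] N_{0,k_p} + \alpha\, c_{0,k_p-1} \left( 1 - r_{0,k_p} \right) N_{0,k_p-1}, \quad \text{for } k_p = 1, \ldots, R-1, \tag{11} $$

$$ \dot{N}_{k_u,0} = -\left[ c_{k_u,0} + (1 - c_{k_u,0})\,[\ldots] \right] N_{k_u,0} + c_{k_u-1,0}\, (1 - \alpha)\, N_{k_u-1,0}, \quad \text{for } k_u \geq 1, \tag{12} $$
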



$$ \dot{U} = \alpha \sum_{k_u=0}^{\infty} c_{k_u,R-1}\, N_{k_u,R-1}, \tag{15} $$



where the k_u index and summations are unbounded. In the above 
equations, c_{k_u,k_p} and r_{k_u,k_p} are derived directly from Eqs. 2 and 8 
respectively 



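$$ c_{k_u,k_p}(t) = \frac{p_0 + k_u}{p_0 + k_u + \theta k_p} \left[ \frac{1}{N} \sum_{\{k_u,k_p\} \neq \{0,0\}} [\ldots] + \frac{[\ldots]}{N} \right] \left( 1 - h\, e^{-(t - t_{\mathrm{last}})/\tau'} \right), $$

$$ r_{k_u,k_p} = \frac{[\ldots]}{N}\, \frac{\theta k_p}{\theta k_p + k_u + p_0}. \tag{16} $$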



It can be easily verified that population conservation holds, 
since 

$$ \sum_{k_u=0}^{\infty} \sum_{k_p=0}^{R-1} \dot{N}_{k_u,k_p} + \dot{P} + \dot{U} = 0, \tag{17} $$

for all times. Note that the dynamics contained in Eqs. 10-16 are 
irreversible. If we take the t → ∞ limit in Eqs. 10-16, we find 
N_{k_u,k_p}(∞) = 0 for all k_u, k_p and P(∞) + U(∞) = N, but no 
independent constraint on P(∞) or U(∞). The ratio 
P(∞)/U(∞), therefore, needs to be determined from the evolution 
of the dynamics and the specific initial conditions. 

In order to numerically integrate Eqs. 10-16 we must first 
approximate t − t_last in Eq. 16. Note that for players committing 
their k_p-th crime at time t, there is a t/k_p interval between arrests, so 
that we can reasonably assume t − t_last ≈ t/k_p. As in our numerical 
simulations, if k_p = 0, t − t_last → ∞, and there is no attenuation 
effect since no resources have been assigned to players who have 
never been punished. Since we are deriving continuous ODEs 
starting from parallel update Monte Carlo simulations, an effective 
τ' in Eq. 16 is required, which we estimate to be of the order of 
~10τ. The rescaled τ' = 10τ will largely compensate the difference 
between our parallel update simulations and the sequential update 
in the ODEs. Finally note that Eqs. 10-16 form an infinite set 
Figure 6. The number of crimes per player over the course of a game plotted as a function of h under the constraint hτ + θ = c, for (a) 
τ = 1 (b) τ = 1.5 (c) τ = 2 and (d) τ = 2.5. The constant is chosen as c = 0.4, 0.6, 0.8 so that three curves are shown for each value of τ. Each curve 
terminates at θ = 0. Note that a minimum arises in the case of τ = 1.5 indicating that an optimal allocation of rehabilitation and punishment resources 
exists to minimize crime occurrences. 
doi:10.1371/journal.pone.0085531.g006 



because k_u may grow indefinitely. Thus, in order to numerically 
implement our ODE system, an appropriate truncation scheme is 
necessary. We assume that for large enough k_u = k*, players join 
the pool of 'uncatchable' criminals, truncating the N_{k_u,k_p} hierarchy 
at N_{k*,R−1}. Our effective system is now made of (k* + 1)R coupled 
equations in addition to the two sink equations for P, U and a 
closure equation for N_uncatch that can be written as follows 

$$ \dot{N}_{\mathrm{uncatch}} = (1 - \alpha) \sum_{k_p=0}^{R-1} c_{k^*,k_p}\, N_{k^*,k_p}. \tag{18} $$

The truncation scheme described above should not lead to large 
discrepancies with our simulations if k* is sufficiently large, since the 
likelihood of being neither arrested nor of reforming - and thus 
ending in either the U or P sinks - is small. We set k* = 23 and 
verified in all cases that only a handful of players are able to reach 
the "uncatchable" status. We also verified that slightly smaller 
choices of k* = 20, 21, 22 essentially lead to the same results. In Fig. 8 
we plot the dynamics obtained from our set of coupled ODEs 
under the IC_0 initial conditions, when N_{0,0} = 400 and 
P = U = N_{{k_u,k_p}≠{0,0}} = 0. As can be seen, the agreement with 
our simulation results in Fig. 1 is very good. A similar qualitative 
agreement holds for IC_1, where N_{0,0} = N_{1,0} = 200 and 
P = U = N_{{k_u,k_p}≠{0,0},{1,0}} = 0 and which we do not show here. 
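
To make the bookkeeping of the truncated hierarchy concrete, the following Python sketch integrates a system with the same transition structure: an unpunished crime raises k_u, a punished crime raises k_p, the R-th arrest leads to U, and players at k_u = k* who escape arrest enter the uncatchable pool. It is a sketch under stated assumptions, not the code used for Fig. 8. In particular, the functions `crime_rate` and `reform_rate` are simplified stand-ins rather than Eqs. 2 and 8, the constant `Q_IDLE` (reform probability of a player who does not offend) is hypothetical, and forward Euler is chosen only for brevity. The attenuation factor uses the approximations t − t_last ≈ t/k_p and τ' = 10τ described above.

```python
import math

R = 3          # arrests after which a player becomes unreformable
K_STAR = 23    # truncation level for the number of unpunished crimes
ALPHA = 0.25   # arrest probability
Q_IDLE = 0.05  # ASSUMED reform probability for a player who does not offend

def attenuation(t, kp, h, tau_eff):
    """1 - h * exp(-(t - t_last) / tau'), using t - t_last ~ t / kp."""
    if kp == 0:
        return 1.0  # never punished: t - t_last -> infinity, no attenuation
    return 1.0 - h * math.exp(-(t / kp) / tau_eff)

def crime_rate(ku, kp, t, p0, theta, h, tau_eff):
    """ASSUMED stand-in for c_{ku,kp}(t): the societal factor is fixed at 0.5."""
    return 0.5 * (p0 + ku) / (p0 + ku + theta * kp) * attenuation(t, kp, h, tau_eff)

def reform_rate(ku, kp, p0, theta):
    """ASSUMED stand-in for r_{ku,kp}: the prefactor is fixed at 0.5."""
    return 0.5 * theta * kp / (theta * kp + ku + p0)

def derivatives(N, t, p0, theta, h, tau_eff):
    """Return (dN, dP, dU, dX) for the truncated hierarchy; X is the uncatchable pool."""
    dN = [[0.0] * R for _ in range(K_STAR + 1)]
    dP = dU = dX = 0.0
    for ku in range(K_STAR + 1):
        for kp in range(R):
            n = N[ku][kp]
            c = crime_rate(ku, kp, t, p0, theta, h, tau_eff)
            idle_reform = (1.0 - c) * Q_IDLE * n   # no crime, then reform
            unpunished = (1.0 - ALPHA) * c * n     # crime without arrest
            punished = ALPHA * c * n               # crime followed by arrest
            dN[ku][kp] -= idle_reform + unpunished + punished
            dP += idle_reform
            if ku < K_STAR:
                dN[ku + 1][kp] += unpunished
            else:
                dX += unpunished                   # closure term, cf. Eq. 18
            if kp == R - 1:
                dU += punished                     # R-th arrest, cf. Eq. 15
            else:
                r = reform_rate(ku, kp + 1, p0, theta)
                dP += r * punished
                dN[ku][kp + 1] += (1.0 - r) * punished
    return dN, dP, dU, dX

def integrate(n_players=400.0, steps=2000, dt=0.1, p0=0.1, theta=0.8, h=0.8, tau=2.0):
    """Forward-Euler integration starting with every player in N_{0,0}."""
    N = [[0.0] * R for _ in range(K_STAR + 1)]
    N[0][0] = n_players
    P = U = X = 0.0
    tau_eff = 10.0 * tau  # rescaled memory time tau' = 10 * tau
    for step in range(steps):
        dN, dP, dU, dX = derivatives(N, step * dt, p0, theta, h, tau_eff)
        for ku in range(K_STAR + 1):
            for kp in range(R):
                N[ku][kp] += dt * dN[ku][kp]
        P, U, X = P + dt * dP, U + dt * dU, X + dt * dX
    return N, P, U, X

N, P, U, X = integrate()
total = sum(map(sum, N)) + P + U + X
print(f"P = {P:.1f}, U = {U:.1f}, uncatchable = {X:.2e}, total = {total:.6f}")
```

Because every outflow from a compartment is assigned to exactly one destination, the total population (including the uncatchable pool) stays at its initial value, which mirrors the conservation property of Eq. 17. The numerical values of P and U produced by this sketch depend on the assumed stand-in rates and should not be compared with Fig. 8.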



Summary and Discussion 

We have proposed an evolutionary game that incorporates both 
punishment - the "stick" - and assistance - the "carrot" - to study 
the effects of punishment and rehabilitation on crime within a 
model society of N = Σ_k N_k + P + U individuals. At every round, 
each of the N_k players that have committed k crimes may 
reoffend, and join the N_{k+1} pool, or choose not to reoffend and 
remain in the Nk pool. We also allow players within Nk that 
choose not to reoffend to join the paladin pool P of players that 
will not commit any more crimes in the future. Finally, upon being 
arrested R times, players join the pool of unreformables U. Within 
this context, the index k also represents how hardened or 
experienced the criminal has become. 
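
The round structure summarized above can be sketched as a toy simulation. The Python code below is a deliberately simplified illustration: it uses constant, hypothetical values for the crime and reform probabilities instead of the history-dependent p_crime and p_reform of our model, and it updates players one after another, which makes no difference here because the probabilities do not depend on the state of the rest of society.

```python
import random

R = 3  # arrests after which a player is considered unreformable

def play_round(player, p_crime, p_reform, alpha, rng):
    """Advance one player by one round; player is a dict with keys k, arrests, state."""
    if player["state"] != "active":
        return  # paladins (P) and unreformables (U) no longer change
    if rng.random() < p_crime:
        player["k"] += 1                  # reoffend: move from N_k to N_{k+1}
        if rng.random() < alpha:
            player["arrests"] += 1
            if player["arrests"] >= R:
                player["state"] = "U"     # arrested R times
    elif rng.random() < p_reform:
        player["state"] = "P"             # no crime this round, reforms for good

def simulate(n_players=400, rounds=500, p_crime=0.5, p_reform=0.2, alpha=0.25, seed=1):
    """Run the toy game with CONSTANT, hypothetical probabilities and return final counts."""
    rng = random.Random(seed)
    players = [{"k": 0, "arrests": 0, "state": "active"} for _ in range(n_players)]
    for _ in range(rounds):
        for player in players:
            play_round(player, p_crime, p_reform, alpha, rng)
    counts = {"P": 0, "U": 0, "active": 0}
    for player in players:
        counts[player["state"]] += 1
    return counts

counts = simulate()
print(counts)
```

After sufficiently many rounds every player has been absorbed into either P or U, so the two counts add up to the size of the society.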

Our model was studied via Monte Carlo simulations and via an 
approximate system of ODEs. From both approaches we find that 
increasing the severity of punishment as well as the magnitude and 
time duration of intervention programs yields lower crime 
incidence and recidivism rates. Since in realistic scenarios total 
resources available to law enforcement may be finite, we also 
include a constraint c = hτ + θ on the total punishment θ - the 
stick of our game - and on the rehabilitation resources h, τ - the 
carrot of our game - so that increasing one effort will necessarily 
decrease the other. We find that an optimal allocation of resources 
may exist to minimize recidivism and crime rates, reinforcing the 
emerging viewpoint that a mixture of sufficient punishment and 






Figure 7. Contours of the final values of (a) the P/U ratio, (b) the number of crimes per player, (c) the number of punishments per 
player and (d) the recidivism rate as a function of h, θ for p_0 = 0.1 and τ = 2. Initial conditions IC_1 are chosen so that at the onset of the game 
N_0 = N_1 = N/2 = 200, N_{k>1} = 0 and all players within N_1 are assigned k_u = 1 and k_p = 0. Note that while qualitative trends mirror the results shown in 
Fig. 3 for IC_0, there are quantitative differences between the two different initial conditions. 
doi:10.1371/journal.pone.0085531.g007 



long-lasting assistance efforts upon release may be the most 
effective way to reduce crime. 

From a mathematical point of view, the continuum ODEs we 
derived correspond to random sequential updating processes, 
rather than to the parallel updating schemes used in our 
simulations. We have shown that by considering rescaled time 
scales, and for some parameter regimes, results from the ODEs we 
derived are qualitatively similar to the simulated ones. However, it 
would be mathematically interesting to derive the corresponding 
continuum equations directly from our parallel updated simulations 
and compare how they differ from the current ODEs. 

Several "carrot and stick" evolutionary games and experimental 
studies have been presented in the literature, especially in the 
context of public goods games [14,23,43,44]. In most cases, 
cooperators are rewarded with incentives and defectors punished, 
and in some instances players have the extra option of 
non-participating [14]. A common finding is that, to varying degrees, 
incentives promote cooperation [22,45,46], with punishments 
further enhancing the level of cooperation among players [15]. 
Our work differs from the above scenarios in that instead of 
assigning punishments or rewards to players depending on their 
cooperative or defective behaviors, we both punish and rehabilitate 
defectors, so that their carrot and stick experiences are not 
mutually exclusive and that any player's future behavior depends 
both on how much he or she was punished and on the quality 
and duration of incentives for rehabilitation he or she has received. 
Although the way we assign incentives and punishments differs 
from standard "carrot and stick" games, our results confirm that 
punishment and rewards complement each other and that both 
tools should be used by law enforcement to reduce recidivism. 

Within our work, rehabilitation resources were specified via the 
collective parameter h. However, various rehabilitation opportunities 
are possible - in the form of educational or vocational 
training, behavioral treatments, or fostering family relationships. 
Each of these comes with possible modeling opportunities and 
challenges that are beyond the scope of this work. We have also 
made numerous assumptions in our work by neglecting effects of 
heterogeneity in age, race, gender or other socio-economic or 
geographical considerations on p_crime and p_reform. We have 
assumed all-to-all couplings between players so that each 
individual's choices depend on the entire society. The introduction 
of a dynamical network where each individual is linked to friends, 
family and employers that selectively influence each player's 
decisions, could represent a more realistic approach. Finally, we 
have kept the arrest probability α fixed and assumed that 
rehabilitation efforts were assigned to all players, with a fixed 
magnitude and time duration, regardless of the player's history 
and have not included incarceration periods between crime events. 
Including all these refinements would add more complexity to the 
Figure 8. Dynamics of Paladins P(t) and Unreformables U(t) according to the ODEs in Eqs. 10-16 for p_0 = 0.1, τ' = 20 and variable h, θ 
under initial conditions IC_0 where N = N_0 = 400. Note that the dynamics are qualitatively similar to the simulation results shown in Fig. 1. 
doi:10.1371/journal.pone.0085531.g008 



underlying model; whether and how they may change our results 
will be the subject of future investigation. 